Brazilian patent BR112015021520B1: APPARATUS AND METHOD FOR CREATING ONE OR MORE AUDIO OUTPUT CHANNEL SIGNALS DEPENDING ON TWO OR MORE

Abstract:
Apparatus and method for multichannel direct-ambient decomposition for audio signal processing. An apparatus is provided for creating one or more audio output channel signals depending on two or more audio input channel signals. Each of the two or more audio input channel signals comprises direct signal portions and ambient signal portions. The apparatus comprises a filter determination unit (110) for determining a filter by estimating first power spectral density information and second power spectral density information. Furthermore, the apparatus comprises a signal processor (120) for creating the one or more audio output channel signals by applying the filter to the two or more audio input channel signals. The first power spectral density information indicates power spectral density information on the two or more audio input channel signals, and the second power spectral density information indicates power spectral density information on the ambient signal portions of the two or more audio input channel signals. Or, the first power spectral density information indicates power spectral density information on the two or more audio input channel signals, and the second power spectral density information indicates power spectral density information on the direct signal portions of the two or more audio input channel signals. Or, the first power spectral density information indicates power spectral density information on the direct signal portions of the two or more audio input channel signals, and the second power spectral density information indicates power spectral density information on the ambient signal portions of the two or more audio input channel signals. (Fig. 1)
Publication number: BR112015021520B1
Application number: R112015021520-3
Filing date: 2013-10-23
Publication date: 2021-07-13
Inventors: Christian Uhle; Emanuel Habets; Patrick GAMPP; Michael KRATZ
Applicant: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Main IPC class:

Description:

[001] The present invention relates to an apparatus and a method for multichannel direct-ambient decomposition for audio signal processing.
[002] Audio signal processing is becoming increasingly important. In this field, the separation of sound signals into direct and ambient sound signals plays an important role.
[003] In general, acoustic sounds consist of a mixture of direct sounds and ambient (or diffuse) sounds. Direct sounds are emitted by sound sources, e.g. a musical instrument, a vocalist or a loudspeaker, and arrive via the shortest possible path at the receiver, e.g. the listener's ear entrance or a microphone.
[004] When a direct sound is heard, it is perceived as coming from the direction of the sound source. The relevant auditory cues for localization and for other spatial sound properties are the interaural level difference, the interaural time difference and the interaural coherence. Direct sound waves that evoke identical interaural level differences and identical interaural time differences are perceived as coming from the same direction. In the absence of diffuse sound, the signals reaching the left and the right ear, or any other set of spaced sensors, are coherent.
[005] Ambient sounds, in contrast, are emitted by many spaced sound sources or sound-reflecting boundaries that contribute to the same ambient sound. When a sound wave hits a wall in a room, part of it is reflected, and the superposition of all reflections in a room, the reverberation, is a prominent example of ambient sound. Other examples are audience sounds (e.g. applause), environmental sounds (e.g. rain) and other background sounds (e.g. babble noise). Ambient sounds are perceived as diffuse and not locatable, and they evoke an impression of envelopment (of being "immersed in sound") in the listener. When an ambient sound field is captured with a set of spaced sensors, the recorded signals are at least partially incoherent.
[006] Various applications in sound post-production and reproduction benefit from a decomposition of audio signals into direct signal components and ambient signal components. The main challenge for such signal processing is to achieve high separation while maintaining high sound quality for an arbitrary number of input channel signals and for all possible input signal characteristics. Direct-ambient decomposition (DAD), i.e. the decomposition of audio signals into direct signal components and ambient signal components, allows the signal components to be reproduced or modified separately, which is desired, for example, for upmixing audio signals.
[007] The term upmixing refers to the process of creating a signal with P channels from an input signal with N channels, where P > N. Its main application is the reproduction of audio signals on surround sound setups that have more channels than are available in the input signal. Reproducing the content using advanced signal processing algorithms enables the listener to use all available channels of the multichannel sound reproduction setup. Such processing may decompose the input signal into meaningful signal components (e.g. based on their perceived position in the stereo image, direct sounds versus ambient sounds, single instruments) or into signals in which these signal components are attenuated or boosted.
[008] Two upmixing concepts are widely known. 1. Guided upmixing: upmixing with additional information that guides the upmixing process. The additional information may be "encoded" in a specific way in the input signal or may be stored additionally. 2. Unguided upmixing: the output signal is obtained exclusively from the audio input signal, without any additional information.
[009] Advanced upmixing methods can be further categorized with respect to the positioning of direct and ambient signals. A distinction is made between the "direct/ambient" approach and the "in-the-band" approach. The core component of direct/ambient-based techniques is the extraction of an ambient signal that is fed, e.g., into the rear channels or the height channels of a multichannel surround sound setup. Reproducing ambience over the rear or height channels evokes an impression of envelopment (being "immersed in sound") in the listener. Additionally, the direct sound sources can be distributed among the front channels according to their perceived position in the stereo panorama. In contrast, the "in-the-band" approach aims at positioning all sounds (direct sounds as well as ambient sounds) around the listener using all available loudspeakers.
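As a rough illustration of the direct/ambient routing described in [009], the sketch below maps an already decomposed stereo signal onto a four-channel layout. The function name `route_direct_ambient`, the channel order and the plain one-to-one routing are assumptions made for illustration; the text only states that ambience is fed into the rear or height channels and that direct sound is distributed among the front channels.

```python
import numpy as np

def route_direct_ambient(d_hat, a_hat):
    """Map a decomposed stereo signal to a hypothetical 4-channel layout.

    d_hat, a_hat: arrays of shape (2, samples) holding the direct and the
    ambient estimates of the left and right input channels.
    Returns an array of shape (4, samples) ordered FL, FR, RL, RR.
    """
    d_hat = np.asarray(d_hat, dtype=float)
    a_hat = np.asarray(a_hat, dtype=float)
    # Front channels carry the direct sound, rear channels the ambience.
    return np.vstack([d_hat[0], d_hat[1], a_hat[0], a_hat[1]])
```

A real upmixer would additionally re-pan the direct sources and usually decorrelate the rear feeds; both steps are left out here.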
[010] Decomposing an audio signal into direct and ambient signals also enables the separate modification of the ambient sounds or the direct sounds, e.g. by scaling or filtering them. One use case is the processing of a recording of a musical performance that was captured with too much ambient sound. Another use case is audio production (e.g. for movie sound or music), in which audio signals that were captured at different locations, and therefore have different ambient sound characteristics, are combined.
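The first use case in [010], reducing an excessive amount of ambience, amounts to scaling the ambient estimate before adding it back to the direct estimate. A minimal sketch, assuming the two estimates are already available; the function name `remix` and the gain parameter in decibels are hypothetical.

```python
import numpy as np

def remix(d_hat, a_hat, ambient_gain_db=-6.0):
    """Recombine direct and ambient estimates with a scaled ambience level.

    d_hat, a_hat: arrays of equal shape (any channel/sample layout).
    ambient_gain_db: gain applied to the ambience; 0 dB restores the input mix.
    """
    gain = 10.0 ** (ambient_gain_db / 20.0)  # convert dB to a linear factor
    return np.asarray(d_hat) + gain * np.asarray(a_hat)
```

Filtering the ambience instead of scaling it would follow the same pattern, with the scalar gain replaced by a filter.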
[011] In any case, the requirements for such signal processing are to achieve high separation while maintaining high sound quality for an arbitrary number of input channel signals and for all possible input signal characteristics.
[012] Various approaches for DAD, or for attenuating or boosting either the direct signal components or the ambient signal components, have been provided in the prior art and are briefly summarized below.
[013] There are known concepts that refer to speech signal processing in order to remove unwanted background noise from microphone recordings.
[014] A method for attenuating the reverberation in speech recordings with two input channels is described in [1]. The reverberant signal components are reduced by attenuating the uncorrelated (or diffuse) signal components in the input signal. The processing is implemented in the time-frequency domain, so that subband signals are processed by means of a spectral weighting method. The real-valued weighting factors are computed using the power spectral densities (PSDs)
$$\Phi_{xx}(m,k) = E\{X(m,k)\,X^*(m,k)\} \qquad (1)$$
$$\Phi_{yy}(m,k) = E\{Y(m,k)\,Y^*(m,k)\} \qquad (2)$$
$$\Phi_{xy}(m,k) = E\{X(m,k)\,Y^*(m,k)\} \qquad (3)$$
[015] where X(m,k) and Y(m,k) are the time-frequency domain representations of the time-domain input signals xt[n] and yt[n], E{·} is the expectation operation and X* is the complex conjugate of X.
[016] The original authors point out that different spectral weighting functions are feasible when they are proportional to Φxy(m,k), e.g. weights equal to the normalized cross-correlation function (or coherence function)
$$\rho(m,k) = \frac{|\Phi_{xy}(m,k)|}{\sqrt{\Phi_{xx}(m,k)\,\Phi_{yy}(m,k)}} \qquad (4)$$
[017] Following a similar rationale, the method described in [2] extracts an ambient signal using spectral weighting with weights derived from the normalized cross-correlation function computed in frequency bands, see Formula (4) (or, in the words of the original authors, the "interchannel short-time coherence function"). The difference compared to [1] is that, instead of the diffuse signal components, the direct signal components are attenuated, using spectral weights that are a monotonic, continuous function of (1 - ρ(m,k)).
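The coherence-based weighting of [1] and [2] can be sketched as follows. This is a minimal illustration, not the implementation of either reference: the expectation E{·} is approximated by first-order recursive averaging over time frames, and the smoothing constant `alpha`, the regularizer `eps` and the function name are assumptions.

```python
import numpy as np

def coherence_weights(X, Y, alpha=0.9, eps=1e-12):
    """Short-time coherence rho(m, k) of two STFT signals.

    X, Y: complex arrays of shape (frames, subbands).
    Returns a real array of the same shape with values between 0 and 1.
    """
    n_frames, n_bins = X.shape
    phi_xx = np.zeros(n_bins)
    phi_yy = np.zeros(n_bins)
    phi_xy = np.zeros(n_bins, dtype=complex)
    rho = np.zeros((n_frames, n_bins))
    for m in range(n_frames):
        # Recursive averaging stands in for the expectation (an assumption).
        phi_xx = alpha * phi_xx + (1 - alpha) * np.abs(X[m]) ** 2
        phi_yy = alpha * phi_yy + (1 - alpha) * np.abs(Y[m]) ** 2
        phi_xy = alpha * phi_xy + (1 - alpha) * X[m] * np.conj(Y[m])
        rho[m] = np.abs(phi_xy) / np.sqrt(phi_xx * phi_yy + eps)
    return rho
```

Weights equal to `rho` attenuate diffuse components as in [1], whereas weights such as `1 - rho` attenuate direct components as in [2]. The identity mapping of `1 - rho` is only the simplest choice of a monotonic function.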
[018] The decomposition of two-channel input signals for upmixing applications using multichannel Wiener filtering is described in [3]. The processing is carried out in the time-frequency domain. The input signal is modeled as a mixture of the ambient signal and one active direct source (per frequency band), where the direct signal in one channel is restricted to be a scaled copy of the direct signal component in the second channel, i.e. amplitude panning. The panning coefficient and the powers of the direct signal and the ambient signal are estimated using the normalized cross-correlation and the input signal powers in both channels. The direct output signal and the ambient output signals are derived from linear combinations of the input signals with real-valued weighting coefficients. Additional post-scaling is applied so that the power of the output signals equals the estimated quantities.
[019] The method described in [4] extracts an ambient signal using spectral weighting based on an estimate of the ambient power. The ambient power is estimated based on the assumptions that the direct signal components in both channels are fully correlated, that the ambient channel signals are uncorrelated with each other and with the direct signals, and that the ambient powers in both channels are equal.
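The three assumptions listed in [019] are sufficient to determine the ambient power in each time-frequency tile. With direct powers P1 and P2 and a common ambient power Pa, they give Φxx = P1 + Pa, Φyy = P2 + Pa and |Φxy|² = P1·P2, which is a quadratic equation in Pa. The sketch below solves it; it is a derivation from the stated assumptions and not necessarily the estimator used in [4], and the function name is hypothetical.

```python
import numpy as np

def ambient_power(phi_xx, phi_yy, phi_xy):
    """Solve (phi_xx - Pa) * (phi_yy - Pa) = |phi_xy|^2 for the ambient power Pa.

    phi_xx, phi_yy: real auto-PSDs of the two channels.
    phi_xy: complex cross-PSD.
    The smaller root is returned because Pa cannot exceed either channel power.
    """
    total = phi_xx + phi_yy
    root = np.sqrt((phi_xx - phi_yy) ** 2 + 4.0 * np.abs(phi_xy) ** 2)
    return 0.5 * (total - root)
```

The result equals the smaller eigenvalue of the 2 x 2 PSD matrix of the two channels, so it works on arrays of tiles as well as on scalars.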
[020] A method for upmixing stereo signals based on Directional Audio Coding (DirAC) is described in [5]. DirAC aims at analyzing and reproducing the direction of arrival, the diffuseness and the spectrum of a sound field. For upmixing stereo input signals, anechoic B-format recordings of the input signals are simulated.
[021] A method for extracting the uncorrelated reverberation from stereo audio signals using an adaptive filter algorithm is described in [6]. The algorithm aims at predicting the direct signal component in one channel signal from the other channel signal by means of a Least Mean Square (LMS) algorithm. Subsequently, the ambient signals are derived by subtracting the estimated direct signals from the input signals. The rationale of this approach is that the prediction only works for correlated signals, so that the prediction error resembles the uncorrelated signal. Various adaptive filter algorithms based on the LMS principle exist and are feasible, e.g. the LMS algorithm or the Normalized LMS (NLMS) algorithm.
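The prediction idea in [021] can be sketched with a textbook NLMS filter: one channel is predicted from the other, and the prediction error is taken as the ambient estimate. The filter order, the step size `mu` and the function name are assumptions for illustration and are not taken from [6].

```python
import numpy as np

def nlms_ambient(x, y, order=32, mu=0.5, eps=1e-8):
    """Predict y from x with an NLMS filter and return the prediction error.

    x, y: real 1-D arrays of equal length (two channels of a stereo signal).
    The error approximates the part of y that is uncorrelated with x.
    """
    w = np.zeros(order)        # adaptive filter coefficients
    buf = np.zeros(order)      # most recent samples of x, newest first
    ambient = np.zeros(len(y))
    for n in range(len(y)):
        buf = np.roll(buf, 1)
        buf[0] = x[n]
        err = y[n] - w @ buf                        # prediction error
        w += mu * err * buf / (buf @ buf + eps)     # normalized LMS update
        ambient[n] = err
    return ambient
```

Swapping the roles of `x` and `y` yields the ambient estimate for the other channel.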
[022] For the decomposition of input signals with more than two channels, a method is described in [7] in which the multichannel signals are first downmixed to obtain a two-channel stereo signal, and subsequently a method for processing stereo input signals presented in [3] is applied.
[023] For processing mono signals, the method described in [8] extracts an ambient signal using spectral weighting, where the spectral weights are computed using feature extraction and supervised learning.
[024] Another method for extracting an ambient signal from mono recordings for upmixing applications obtains the time-frequency domain representation of the ambient signal from the difference between the time-frequency domain representation of the input signal and a compressed version of it, preferably computed using non-negative matrix factorization [9].
[025] A method for extracting and changing the reverberant signal components in an audio signal based on estimating the magnitude transfer function of the reverberant system that created the reverberant signal is described in [10]. An estimate of the magnitudes of the frequency domain representation of the signal components is derived through recursive filtering and can be modified.
[026] The object of the present invention is to provide improved concepts for multichannel direct-ambient decomposition for audio signal processing. The object of the present invention is achieved by an apparatus according to claim 1, by a method according to claim 14 and by a computer program according to claim 15.
[027] An apparatus is provided for creating one or more audio output channel signals depending on two or more audio input channel signals. Each of the two or more audio input channel signals comprises direct signal portions and ambient signal portions. The apparatus comprises a filter determination unit for determining a filter by estimating first power spectral density information and second power spectral density information. Furthermore, the apparatus comprises a signal processor for creating the one or more audio output channel signals by applying the filter to the two or more audio input channel signals. The first power spectral density information indicates power spectral density information on the two or more audio input channel signals, and the second power spectral density information indicates power spectral density information on the ambient signal portions of the two or more audio input channel signals. Or, the first power spectral density information indicates power spectral density information on the two or more audio input channel signals, and the second power spectral density information indicates power spectral density information on the direct signal portions of the two or more audio input channel signals. Or, the first power spectral density information indicates power spectral density information on the direct signal portions of the two or more audio input channel signals, and the second power spectral density information indicates power spectral density information on the ambient signal portions of the two or more audio input channel signals.
[028] Embodiments provide concepts for decomposing audio input signals into direct signal components and ambient signal components, which can be applied in sound post-production and reproduction. The main challenge for such signal processing is to achieve high separation while maintaining high sound quality for an arbitrary number of input channel signals and for all possible input signal characteristics. The provided concepts are based on multichannel signal processing in the time-frequency domain, which leads to a constrained optimal solution in the mean squared error sense, e.g. subject to constraints on the distortion of the estimated desired signals or on the reduction of the residual interference.
[029] Embodiments for decomposing audio input signals into direct signal components and ambient signal components are provided. Furthermore, a derivation of filters for computing the ambient signal components is provided, and embodiments for applications of the filters are described.
[030] Some embodiments relate to unguided upmixing following the direct/ambient approach with input signals that have more than one channel.
[031] For the envisaged applications of the described decomposition, one is interested in computing output signals with the same number of channels as the input signal. For this application, embodiments provide very good results in terms of separation and sound quality, because they can cope with input signals in which the direct signals are time-delayed between the input channels. In contrast to other concepts, e.g. the concepts provided in [3], embodiments do not assume that the direct sounds in the input signals are panned by scaling only (amplitude panning), but also allow for time differences between the direct signals in each channel.
[032] Furthermore, embodiments are able to operate on input signals with an arbitrary number of channels, in contrast to all other prior-art concepts (see above), which can only process input signals with one or two channels.
[033] Further advantages of embodiments are the use of control parameters, the estimation of the ambient PSD matrix and further modifications of the filter, as described below.
[034] Some embodiments provide consistent ambient sounds for all input sound objects. When the input signals are decomposed into direct and ambient sounds, some embodiments adapt the ambient sound characteristics by means of appropriate audio signal processing, and other embodiments replace the ambient signal components by artificial reverberation and other artificial ambient sounds.
[035] According to an embodiment, the apparatus may further comprise an analysis filter bank configured to transform the two or more audio input channel signals from the time domain to the time-frequency domain. The filter determination unit may be configured to determine the filter by estimating the first power spectral density information and the second power spectral density information depending on the audio input channel signals represented in the time-frequency domain. The signal processor may be configured to create the one or more audio output channel signals, represented in the time-frequency domain, by applying the filter to the two or more audio input channel signals represented in the time-frequency domain. Moreover, the apparatus may further comprise a synthesis filter bank configured to transform the one or more audio output channel signals from the time-frequency domain to the time domain.
[036] Furthermore, a method for creating one or more audio output channel signals depending on two or more audio input channel signals is provided. Each of the two or more audio input channel signals comprises direct signal portions and ambient signal portions. The method comprises:
[037] - determining a filter by estimating first power spectral density information and second power spectral density information, and
[038] - creating the one or more audio output channel signals by applying the filter to the two or more audio input channel signals.
[039] The first power spectral density information indicates power spectral density information on the two or more audio input channel signals, and the second power spectral density information indicates power spectral density information on the ambient signal portions of the two or more audio input channel signals. Or, the first power spectral density information indicates power spectral density information on the two or more audio input channel signals, and the second power spectral density information indicates power spectral density information on the direct signal portions of the two or more audio input channel signals. Or, the first power spectral density information indicates power spectral density information on the direct signal portions of the two or more audio input channel signals, and the second power spectral density information indicates power spectral density information on the ambient signal portions of the two or more audio input channel signals.
[040] Furthermore, a computer program is provided for implementing the above-described method when executed on a computer or signal processor.
[041] In the following, embodiments of the present invention are described in more detail with reference to the figures, in which:
[042] Fig. 1 illustrates an apparatus for creating one or more audio output channel signals depending on two or more audio input channel signals according to an embodiment,
[043] Fig. 2 illustrates input and output signals of the decomposition of a 5-channel recording of classical music, with the input signals (left column), the ambient output signals (middle column) and the direct output signals (right column), according to an embodiment,
[044] Fig. 3 depicts a basic overview of the decomposition using ambient signal estimation and direct signal estimation according to an embodiment,
[045] Fig. 4 depicts a basic overview of the decomposition using direct signal estimation according to an embodiment,
[046] Fig. 5 depicts a basic overview of the decomposition using ambient signal estimation according to an embodiment,
[047] Fig. 6a illustrates an apparatus according to a further embodiment, in which the apparatus further comprises an analysis filter bank and a synthesis filter bank, and
[048] Fig. 6b illustrates an apparatus according to a further embodiment, depicting the extraction of the direct signal components, where the block AFB is a set of N analysis filter banks (one for each channel) and where SFB is a set of synthesis filter banks.
[049] Fig. 1 illustrates an apparatus for creating one or more audio output channel signals depending on two or more audio input channel signals according to an embodiment. Each of the two or more audio input channel signals comprises direct signal portions and ambient signal portions.
[050] The apparatus comprises a filter determination unit 110 for determining a filter by estimating first power spectral density information and second power spectral density information.
[051] Furthermore, the apparatus comprises a signal processor 120 for creating the one or more audio output channel signals by applying the filter to the two or more audio input channel signals.
[052] The first power spectral density information indicates power spectral density information on the two or more audio input channel signals, and the second power spectral density information indicates power spectral density information on the ambient signal portions of the two or more audio input channel signals.
[053] Or, the first power spectral density information indicates power spectral density information on the two or more audio input channel signals, and the second power spectral density information indicates power spectral density information on the direct signal portions of the two or more audio input channel signals.
[054] Or, the first power spectral density information indicates power spectral density information on the direct signal portions of the two or more audio input channel signals, and the second power spectral density information indicates power spectral density information on the ambient signal portions of the two or more audio input channel signals.
[055] Embodiments are described that provide concepts for decomposing audio input signals into direct signal components and ambient signal components, which can be applied in sound post-production and reproduction. The main challenge for such signal processing is to achieve high separation while maintaining high sound quality for an arbitrary number of input channel signals and for all possible input signal characteristics. The provided embodiments are based on multichannel signal processing in the time-frequency domain and provide an optimal solution in the mean squared error sense, subject to constraints on the distortion of the estimated desired signals or on the reduction of the residual interference.
[056] First, the inventive concepts on which the embodiments of the present invention are based are described.
[057] It is assumed that N input channel signals yt[n] are received:
[058] $$\mathbf{y}_t[n] = [\,y_1[n] \;\dots\; y_N[n]\,]^T \qquad (5)$$
[059] For example, N ≥ 2. The aim of the provided concepts is to decompose the input channel signals yt[n] = [y1[n] ... yN[n]]T into N direct signal components, denoted by dt[n] = [d1[n] ... dN[n]]T, and/or N ambient signal components, denoted by at[n] = [a1[n] ... aN[n]]T. The processing can be applied to all input channels, or the input channels can be divided into subsets of channels that are processed separately.
[060] According to some embodiments, one or more of the direct signal components d1[n], ..., dN[n] and/or one or more of the ambient signal components a1[n], ..., aN[n] are to be estimated from the input channel signals y1[n], ..., yN[n], to obtain one or more estimates d̂t[n] and/or ât[n] of the direct signal components d1[n], ..., dN[n] and/or of the ambient signal components a1[n], ..., aN[n] as the one or more output channel signals.
[061] An example of the outputs provided by some embodiments is shown in Fig. 2 for N = 5. The one or more output channel signals d̂t[n], ât[n] are obtained by estimating the direct signal components and the ambient signal components independently, as shown in Fig. 3. Alternatively, an estimate (d̂t[n] or ât[n]) of one of the two signals (either dt[n] or at[n]) is computed, and the other signal is obtained by subtracting the first result from the input signal. Fig. 4 illustrates the processing for first estimating the direct signal components dt[n] and then deriving the ambient signal components at[n] by subtracting the estimate of the direct signals from the input signal. Following the same rationale, the estimate of the ambient signal components can be derived first, as illustrated in the block diagram of Fig. 5.
[062] According to embodiments, the processing can, for example, be carried out in the time-frequency domain. A time-frequency domain representation of the audio input signal can, for example, be obtained by means of a filter bank (the analysis filter bank), e.g. the Short-Time Fourier Transform (STFT).
[063] According to the embodiment illustrated by Fig. 6a, an analysis filter bank 605 transforms the audio input channel signals yt[n] from the time domain to the time-frequency domain. Furthermore, in Fig. 6a, a synthesis filter bank 625 transforms the estimates of the direct signal components d̂(m,1), ..., d̂(m,K) from the time-frequency domain to the time domain to obtain the audio output channel signals d̂t[n].
[064] In the embodiment of Fig. 6a, the analysis filter bank 605 is configured to transform the two or more audio input channel signals from the time domain to the time-frequency domain. The filter determination unit 110 is configured to determine the filter by estimating the first power spectral density information and the second power spectral density information depending on the audio input channel signals represented in the time-frequency domain. The signal processor 120 is configured to create the one or more audio output channel signals, represented in the time-frequency domain, by applying the filter to the two or more audio input channel signals represented in the time-frequency domain. The synthesis filter bank 625 is configured to transform the one or more audio output channel signals from the time-frequency domain to the time domain.
[065] A time-frequency domain representation comprises a certain number of subband signals that evolve over time. Adjacent subbands can optionally be linearly combined into broader subband signals to reduce the computational complexity. Each subband of the input signals is processed separately, as described in detail below. The time-domain output signals are obtained by applying the inverse processing of the filter bank, i.e. the synthesis filter bank. All signals are assumed to have zero mean, and the time-frequency domain signals can be modeled as complex random variables.
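The text names the STFT as one possible analysis filter bank. A minimal single-channel sketch of an STFT analysis/synthesis pair follows; the square-root Hann window, the 50 % overlap, the FFT size and the function names are assumptions chosen so that analysis followed by synthesis reconstructs the signal. For an N-channel input, the same pair would be applied to each channel.

```python
import numpy as np

def stft(x, n_fft=512):
    """Analysis filter bank: returns Y[m, k] with frame index m and subband k."""
    hop = n_fft // 2
    win = np.sqrt(np.hanning(n_fft + 1)[:-1])   # periodic square-root Hann
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[m * hop:m * hop + n_fft] * win
                       for m in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(Y, n_fft=512):
    """Synthesis filter bank: windowed overlap-add of the inverse FFT frames."""
    hop = n_fft // 2
    win = np.sqrt(np.hanning(n_fft + 1)[:-1])
    frames = np.fft.irfft(Y, n=n_fft, axis=1) * win
    out = np.zeros(hop * (len(Y) - 1) + n_fft)
    for m, frame in enumerate(frames):
        out[m * hop:m * hop + n_fft] += frame
    return out
```

Because the squared window sums to one at 50 % overlap, every sample covered by two frames is reconstructed exactly. Only the first and last half frame are attenuated by the window.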
[066] In the following, definitions and assumptions are given.
[067] The following definitions are used throughout the description of the developed method. The time-frequency domain representation of a multichannel input signal with N channels is given by
$$\mathbf{y}(m,k) = [\,Y_1(m,k) \;\dots\; Y_N(m,k)\,]^T \qquad (6)$$
[068] with time index m and subband index k, k = 1 ... K. It is assumed to be an additive mixture of the direct signal component d(m,k) and the ambient signal component a(m,k),
[069] that is,
$$\mathbf{y}(m,k) = \mathbf{d}(m,k) + \mathbf{a}(m,k) \qquad (7)$$
[070] with
$$\mathbf{d}(m,k) = [\,D_1(m,k) \;\dots\; D_N(m,k)\,]^T \qquad (8)$$
$$\mathbf{a}(m,k) = [\,A_1(m,k) \;\dots\; A_N(m,k)\,]^T \qquad (9)$$
[071] where Di(m,k) denotes the direct component and Ai(m,k) the ambient component in the i-th channel.
[072] The aim of direct-ambient decomposition is to estimate d(m,k) and a(m,k). The output signals are computed using the filter matrices HD(m,k) or HA(m,k), or both. The filter matrices are of size N x N and are complex-valued or, in some embodiments, may e.g. be real-valued. An estimate of the N-channel signals of the direct signal components and the ambient signal components is obtained from
$$\hat{\mathbf{d}}(m,k) = \mathbf{H}_D^H(m,k)\,\mathbf{y}(m,k) \qquad (10)$$
$$\hat{\mathbf{a}}(m,k) = \mathbf{H}_A^H(m,k)\,\mathbf{y}(m,k) \qquad (11)$$
[073] Alternatively, only one filter matrix can be used, and the subtraction illustrated in Fig. 4 can be expressed as
$$\hat{\mathbf{d}}(m,k) = \mathbf{H}_D^H(m,k)\,\mathbf{y}(m,k) \qquad (12)$$
$$\hat{\mathbf{a}}(m,k) = [\mathbf{I} - \mathbf{H}_D(m,k)]^H\,\mathbf{y}(m,k) \qquad (13)$$
[074] where I is the identity matrix of size N x N, or, as illustrated in Fig. 5, as
$$\hat{\mathbf{a}}(m,k) = \mathbf{H}_A^H(m,k)\,\mathbf{y}(m,k) \qquad (14)$$
$$\hat{\mathbf{d}}(m,k) = [\mathbf{I} - \mathbf{H}_A(m,k)]^H\,\mathbf{y}(m,k) \qquad (15)$$
[075] respectively. Here, the superscript H denotes the conjugate transpose of a matrix or a vector. The filter matrix HD(m,k) is used to compute estimates of the direct signals d(m,k). The filter matrix HA(m,k) is used to compute estimates of the ambient signals a(m,k).
[076] In Formulas (10) to (15) above, y(m,k) denotes the two or more audio input channel signals, â(m,k) denotes an estimate of the ambient signal portions and d̂(m,k) an estimate of the direct signal portions of the audio input channel signals. â(m,k) and/or d̂(m,k), or one or more vector components of â(m,k) and/or d̂(m,k), can be the one or more audio output channel signals.
[077] One, some or all of Formulas (10), (11), (12), (13), (14) and (15) may be employed by the signal processor 120 of Fig. 1 and Fig. 6a for applying the filter of Fig. 1 and Fig. 6a to the audio input channel signals. The filter of Fig. 1 and Fig. 6a can, for example, be HD(m,k), HA(m,k), HD^H(m,k), HA^H(m,k), [I - HD(m,k)] or [I - HA(m,k)]. In other embodiments, however, the filter determined by the filter determination unit 110 and employed by the signal processor 120 need not be a matrix, but may be any other kind of filter. For example, in other embodiments, the filter may comprise one or more vectors that define the filter. In further embodiments, the filter may comprise a plurality of coefficients that define the filter.
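Applying a filter matrix to one time-frequency tile is a single matrix-vector product. The sketch below follows the variant of Fig. 4, in which only the direct filter is used: d̂ = HD^H y, and the ambient estimate is what remains, â = [I - HD]^H y. The function name is hypothetical, and the filter matrix is taken as given because its derivation comes later in the description.

```python
import numpy as np

def decompose_tile(y, H_D):
    """Split one subband vector y(m, k) using a single direct filter matrix.

    y:   complex array of shape (N,), the N input channels of one tile.
    H_D: complex array of shape (N, N), the direct filter matrix.
    Returns (d_hat, a_hat), each of shape (N,).
    """
    n_ch = len(y)
    d_hat = H_D.conj().T @ y                     # direct estimate
    a_hat = (np.eye(n_ch) - H_D).conj().T @ y    # ambient estimate by subtraction
    return d_hat, a_hat
```

By construction the two estimates add up to the input, `d_hat + a_hat == y`, whatever filter matrix is used. This no longer holds when two independently derived matrices HD and HA are applied.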
[078] The filter matrices are computed from estimates of the signal statistics, as described below.
[079] In particular, the filter determination unit 110 is configured to determine the filter by estimating the first power spectral density (PSD) information and the second PSD information.
[080] The following is defined:
$$\Phi_{ij}(m,k) = E\{X_i(m,k)\,X_j^*(m,k)\} \qquad (16)$$
[081] where E{·} is the expectation operator and X* denotes the complex conjugate of X. For i = j the PSDs are obtained, and for i ≠ j the cross-PSDs.
[082] The covariance matrices for y(m,k), d(m,k) and a(m,k) are
$$\mathbf{\Phi}_y(m,k) = E\{\mathbf{y}(m,k)\,\mathbf{y}^H(m,k)\} \qquad (17)$$
$$\mathbf{\Phi}_d(m,k) = E\{\mathbf{d}(m,k)\,\mathbf{d}^H(m,k)\} \qquad (18)$$
$$\mathbf{\Phi}_a(m,k) = E\{\mathbf{a}(m,k)\,\mathbf{a}^H(m,k)\} \qquad (19)$$
[083] The covariance matrices Φy(m,k), Φd(m,k) and Φa(m,k) comprise the PSD estimates for all channels on the main diagonal, while the off-diagonal elements are the cross-PSD estimates of the respective channel signals. Therefore, each of the matrices Φy(m,k), Φd(m,k) and Φa(m,k) represents an estimate of power spectral density information.
[084] In Formulas (17) to (19), Φy(m,k) indicates power spectral density information on the two or more audio input channel signals, Φd(m,k) indicates power spectral density information on the direct signal components of the two or more audio input channel signals, and Φa(m,k) indicates power spectral density information on the ambient signal components of the two or more audio input channel signals.
[085] Each of the matrices Φy(m,k), Φd(m,k) and Φa(m,k) of Formulas (17), (18) and (19) can be considered as power spectral density information. Note, however, that in other embodiments the first and the second power spectral density information need not be matrices, but may be represented in any other suitable form. For example, according to embodiments, the first and/or the second power spectral density information may be represented as one or more vectors. In further embodiments, the first and/or the second power spectral density information may be represented as a plurality of coefficients.
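In practice the expectation in the covariance matrix Φy = E{y y^H} has to be replaced by an average over observed data. The sketch below uses a plain sample average over the time frames of one subband; this averaging scheme and the function name are assumptions, since the passage does not say how the expectation is approximated.

```python
import numpy as np

def estimate_psd_matrix(Y):
    """Sample estimate of the N x N PSD matrix of one subband.

    Y: complex array of shape (frames, N); row m holds y(m, k) for a fixed k.
    Entry (i, j) of the result is the average of Y_i * conj(Y_j) over frames.
    """
    return np.einsum('mi,mj->ij', Y, Y.conj()) / Y.shape[0]
```

The main diagonal then holds the per-channel PSD estimates and the off-diagonal entries hold the cross-PSD estimates, and the result is Hermitian. A recursive average over frames would be a common alternative when the statistics change over time.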
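[086] It is assumed that
[087] Di(m,k) and Ai(m,k) are mutually uncorrelated:
E{Di(m,k) Aj*(m,k)} = 0 for all i, j
[088] Ai(m,k) and Aj(m,k) are mutually uncorrelated:
E{Ai(m,k) Aj*(m,k)} = 0 for all i ≠ j
[089] The ambient power is the same on all channels:
E{Ai(m,k) Ai*(m,k)} = E{Aj(m,k) Aj*(m,k)} for all i, j
[090] As a consequence, it holds that
Φy(m,k) = Φd(m,k) + Φa(m,k)    (20)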
[091] As a consequence of Formula (20), when two of the matrices Φy(m,k), Φd(m,k) and Φa(m,k) have been determined, the third is immediately available. A further consequence is that it suffices to determine only:
[092] - power spectral density information about the two or more audio input channel signals, and power spectral density information about the ambient signal parts of the two or more audio input channel signals, or
[093] - power spectral density information about the two or more audio input channel signals, and power spectral density information about the direct signal parts of the two or more audio input channel signals, or
[094] - power spectral density information about the direct signal parts of the two or more audio input channel signals, and power spectral density information about the ambient signal parts of the two or more audio input channel signals,
[095] because the third power spectral density information (which was not estimated) follows immediately from the relationship between the three kinds of power spectral density information, e.g., via Formula (20), or via any other formulation of the relationship between the three kinds of power spectral density information (PSD of the complete input signal, PSD of the ambient components and PSD of the direct components) when these are not represented as matrices but are available in another suitable representation, e.g., as one or more vectors or as a plurality of coefficients.
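The three alternatives can be sketched as follows, assuming the additive relationship in which the input PSD matrix is the sum of the direct and ambient PSD matrices. The helper names and the numeric matrices are hypothetical.

```python
def _combine(A, B, sign):
    """Element-wise A + sign * B for equally sized square matrices."""
    n = len(A)
    return [[A[i][j] + sign * B[i][j] for j in range(n)] for i in range(n)]


def third_psd(phi_y=None, phi_d=None, phi_a=None):
    """Return the one matrix that was not supplied, assuming phi_y = phi_d + phi_a."""
    if phi_y is None:
        return _combine(phi_d, phi_a, +1)
    if phi_d is None:
        return _combine(phi_y, phi_a, -1)
    return _combine(phi_y, phi_d, -1)


phi_y = [[3.0, 1.0], [1.0, 2.0]]   # hypothetical input PSD matrix
phi_a = [[0.5, 0.0], [0.0, 0.5]]   # hypothetical ambient PSD matrix
phi_d = third_psd(phi_y=phi_y, phi_a=phi_a)
```

Whichever two matrices are estimated, the same call recovers the remaining one.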
[096] To assess the performance of the developed method, the following signals are defined:
[001] Direct signal distortion:

[002] Residual ambient signal:

[003] Distortion of the ambient signal:

[004] Residual direct signal:

[097] In the following, the derivation of the filter matrices according to Fig. 4 and Fig. 5 is described. For better readability, the subband indices and the time indices are omitted.
[098] First, embodiments for the estimation of the direct signal components are described.
[099] The rationale of the developed method is to compute the filters such that the residual ambient signal ra is minimized while the direct signal distortion qd is constrained. This leads to the constrained optimization problem

[100] where σ²d,max is the maximum allowable direct signal distortion. The solution is given by

[101] The filter for computing the direct output signal of the i-th channel is equal to

[102] where ui is a zero vector of length N with a 1 in the i-th position. The parameter βi allows a trade-off between the reduction of the residual ambient signal and the distortion of the direct signal. For the system shown in Fig. 4, lower residual ambient levels in the direct output signals lead to higher ambient levels in the ambient output signals. Less direct signal distortion leads to better attenuation of the direct signal components in the ambient output signals. The time- and frequency-dependent parameter βi can be set separately for each channel and can be controlled by the input signals or by signals derived from them, as described below.
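The formula images for the performance signals and for the constrained problem with its solution are not reproduced in this copy. As a hedged reconstruction, assuming the direct estimate is formed as H_D^H y and that the solution is the standard parametric multichannel Wiener filter for this kind of problem, they would read as follows. The exact notation and numbering of the original may differ.

```latex
% Assumed convention: \hat{d} = H_D^H y with y = d + a.
\begin{align}
q_d &= H_D^H d - d   && \text{(direct signal distortion)} \\
r_a &= H_D^H a       && \text{(residual ambient signal)} \\
H_D(\beta_i) &= \arg\min_{H_D} E\{\|r_a\|^2\}
   \quad \text{subject to} \quad E\{\|q_d\|^2\} \le \sigma_{d,\max}^2 \\
% Lagrangian stationarity: \Phi_a H + \mu \Phi_d (H - I) = 0, with \beta_i = 1/\mu
H_D(\beta_i) &= \left[\Phi_d + \beta_i \Phi_a\right]^{-1} \Phi_d \\
h_{D,i}(\beta_i) &= \left[\Phi_d + \beta_i \Phi_a\right]^{-1} \Phi_d \, u_i
\end{align}
```

In this form a larger βi weights the ambient PSD more strongly and therefore favors ambient reduction over low direct signal distortion.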
[103] Note that a similar solution can be obtained by formulating the constrained optimization problem as

[104] When Φd is of rank one, the relation between σ²d,max and βi for the i-th channel signal is derived as

[105] where Φdi is the PSD of the direct signal in the i-th channel, and λ is the multichannel direct-to-ambient ratio (DAR)

[106] where the trace of a square matrix A is equal to the sum of the elements on the main diagonal,

[107] Note that the statement that Φd is of rank one is only an assumption. Regardless of whether this assumption holds in reality, the embodiments of the present invention employ the above Formulas (26), (27) and (28) even in situations where Φd is actually not of rank one. In such situations, too, the embodiments of the present invention provide good results.
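The multichannel direct-to-ambient ratio can be sketched for two channels as follows. The sketch assumes that λ is the trace of Φa^-1 Φd, which matches the later statement that λ is available once Φa^-1 and Φd are available. The helper names and matrices are hypothetical.

```python
def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def matmul(A, B):
    """Product of two square matrices of equal size."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace(M):
    """Sum of the elements on the main diagonal."""
    return sum(M[i][i] for i in range(len(M)))


def direct_to_ambient_ratio(phi_a, phi_d):
    """lambda = tr{phi_a^-1 phi_d} (assumed reading of the DAR definition)."""
    return trace(matmul(inv2(phi_a), phi_d))


phi_a = [[0.5, 0.0], [0.0, 0.5]]   # equal ambient power, no cross terms
phi_d = [[2.0, 1.0], [1.0, 0.5]]   # rank one: a single direct source
lam = direct_to_ambient_ratio(phi_a, phi_d)   # total direct power 2.5 over ambient 0.5
```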
[108] The estimation of the ambient signal components is described below.
[109] The rationale of the developed method is to compute the filters such that the residual direct signal rd is minimized while the ambient signal distortion qa is constrained. This leads to the constrained optimization problem

[110] where σ²a,max is the maximum allowable ambient signal distortion. The solution is given by

[111] The filter for computing the ambient output signal of the i-th channel is equal to

[112] In the following, detailed embodiments that realize the concepts of the present invention are described.
[113] To determine the power spectral density information, the PSD matrix Φy of the audio input channel signals can, for example, be estimated directly using short-term moving averaging or recursive averaging. The ambient PSD matrix Φa can, for example, be estimated as described below. The direct PSD matrix Φd can then, for example, be obtained using Formula (20).
[114] In the following, it is again assumed that no more than one direct sound source is active at a time in each subband (single direct source), and that consequently Φd is of rank one.
[115] Note that the statements that no more than one direct sound source is active and that Φd is of rank one are only assumptions. Regardless of whether these assumptions hold in reality, the embodiments of the present invention employ the Formulas below, in particular Formulas (32) and (33), even in situations where more than one direct sound source is actually active and where Φd is actually not of rank one. In such situations, too, the embodiments of the present invention provide good results.
[116] Therefore, assuming that no more than one direct sound source is active and that Φd is of rank one, Formula (23) can be expressed as

[117] Formula (33) provides a solution to the constrained optimization problem of Formula (22).
[118] In the above Formulas (32) and (33), Φa^-1 is the inverse matrix of Φa. Evidently, Φa^-1 also indicates power spectral density information about the ambient signal parts of the two or more audio input channel signals.
[119] To determine HD(βi), Φa^-1 and Φd have to be determined. When Φa is available, Φa^-1 can be determined immediately. λ is defined according to Formulas (27) and (28), and its value is available when Φa^-1 and Φd are available. In addition to determining Φa^-1, Φd and λ, a suitable value for βi must be chosen.
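The rank-one simplification can be checked numerically. The sketch below assumes that the general filter has the form [Φd + βi Φa]^-1 Φd and that its rank-one form is Φa^-1 Φd / (βi + λ). Both forms are reconstructions, since the formula images are missing here, and all names and values are hypothetical.

```python
def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def matmul(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def general_filter(phi_d, phi_a, beta):
    """[phi_d + beta * phi_a]^-1 phi_d, usable for any rank of phi_d."""
    s = [[phi_d[i][j] + beta * phi_a[i][j] for j in range(2)] for i in range(2)]
    return matmul(inv2(s), phi_d)


def rank_one_filter(phi_d, phi_a, beta):
    """phi_a^-1 phi_d / (beta + lambda), with lambda = tr{phi_a^-1 phi_d}."""
    m = matmul(inv2(phi_a), phi_d)
    lam = m[0][0] + m[1][1]
    return [[m[i][j] / (beta + lam) for j in range(2)] for i in range(2)]


phi_a = [[0.5, 0.0], [0.0, 0.5]]
phi_d = [[2.0, 1.0], [1.0, 0.5]]   # rank one, so both forms should agree
H_general = general_filter(phi_d, phi_a, beta=1.0)
H_rank_one = rank_one_filter(phi_d, phi_a, beta=1.0)
```

The closed form avoids inverting a matrix that depends on βi, so changing βi per channel or per bin only changes a scalar denominator.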
[120] In addition, Formula (33) may be reworded (see Formula (20)) so that:

[121] so that only the PSD information Φy about the audio input channel signals and the PSD information Φd about the direct signal parts of the audio input channel signals have to be determined.
[122] In addition, Formula (33) may be reformulated (see Formula (20)) so that:

[123] so that only the PSD information Φa^-1 about the ambient signal parts of the audio input channel signals and the PSD information Φd about the direct signal parts of the audio input channel signals have to be determined.
[124] In addition, Formula (33) can be reworded so that:

[125] in order to determine HA(βi).
[126] Formula (33c) provides a solution to the constrained optimization problem of Formula (29).
[127] Similarly, Formulas (33a) and (33b) can be reformulated to:

[128] or to:

[129] Note that once HD(βi) has been determined, the filter HA(βi) is immediately available as HA(βi) = I - HD(βi).
[130] Furthermore, note that once HA(βi) has been determined, the filter HD(βi) is immediately available as HD(βi) = IN×N - HA(βi).
[131] As stated above, to determine HD(βi), e.g., according to Formula (33), Φy and Φa can be determined as follows:
[132] The PSD matrix Φy(m,k) of the audio input channel signals can, for example, be estimated directly, e.g., using recursive averaging
[133]

[134] where α is a coefficient of the filter that determines the integration time, or
[135] for example, using short-term moving weighted average

[136] where L is, e.g., the number of past values used for computing the PSD, and b0 ... bL are the filter coefficients, which are, for example, in the range [0, 1] (i.e., 0 ≤ filter coefficient ≤ 1), or
[137] for example, using short-term moving average, according to Equation (34b) but with
for all i = 0... L.
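The estimators listed above can be sketched as follows. The exact placement of α in the recursive update is not visible in this copy. The sketch assumes that α weights the previous estimate and (1 - α) weights the current outer product, and that the plain moving average uses equal weights. Function names and values are hypothetical.

```python
def outer(y):
    """Instantaneous estimate y y^H for one spectral vector."""
    return [[a * b.conjugate() for b in y] for a in y]


def recursive_psd(prev, y, alpha):
    """One recursive update: alpha * previous estimate + (1 - alpha) * y y^H."""
    inst = outer(y)
    n = len(y)
    return [[alpha * prev[i][j] + (1 - alpha) * inst[i][j] for j in range(n)]
            for i in range(n)]


def moving_psd(history, b):
    """Weighted sum of y y^H over the newest frame and the L past frames.

    history[0] is the newest frame and b[i] is the weight of the frame i steps back.
    """
    n = len(history[0])
    phi = [[0j] * n for _ in range(n)]
    for weight, y in zip(b, history):
        inst = outer(y)
        for i in range(n):
            for j in range(n):
                phi[i][j] += weight * inst[i][j]
    return phi


zero = [[0j, 0j], [0j, 0j]]
phi_rec = recursive_psd(zero, [1 + 0j, 1j], alpha=0.9)
history = [[1 + 0j, 1j], [2 + 0j, -2j]]           # L = 1 past frame
phi_avg = moving_psd(history, b=[0.5, 0.5])        # equal weights 1 / (L + 1)
```

A value of α close to 1 gives a long integration time, while the moving average forgets a frame completely once it leaves the window.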
[138] The estimation of the ambient PSD matrix according to embodiments is now described.
[139] The ambient PSD matrix Φa is given by
Φa = ΦA IN×N    (35)
[140] where IN×N is the identity matrix of size N × N, and ΦA is, e.g., a number.
[141] A solution according to an embodiment is, for example, obtained by using a constant value, i.e., by using Formula (21) and setting ΦA to a real positive constant ε. The advantage of this approach is that the computational complexity is negligible.
[142] In embodiments, the filter determination unit 110 is configured to determine ΦA depending on the two or more audio input channel signals.
[143] An option with very low computational complexity is, according to an embodiment, to use a fraction of the input power and to set ΦA to the mean or minimum value of the input PSDs, or a fraction thereof, e.g.,

[144] where the parameter g controls the amount of ambient power, and 0 < g < 1.
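A minimal sketch of this low-complexity option is given below, assuming the fraction g is applied either to the mean or to the minimum of the per-channel input PSDs. The function names, the mode switch and the numbers are hypothetical.

```python
def ambient_psd_from_input(phi_y, g, mode="mean"):
    """Scalar ambient PSD as a fraction g of the mean or minimum input PSD."""
    if not 0.0 < g < 1.0:
        raise ValueError("g must satisfy 0 < g < 1")
    psds = [phi_y[i][i].real for i in range(len(phi_y))]   # auto-PSDs on the diagonal
    base = min(psds) if mode == "min" else sum(psds) / len(psds)
    return g * base


def ambient_matrix(phi_a_scalar, n):
    """Ambient PSD matrix: the scalar ambient PSD times the N x N identity."""
    return [[phi_a_scalar if i == j else 0.0 for j in range(n)] for i in range(n)]


phi_y = [[3.0, 1.0], [1.0, 2.0]]                         # hypothetical input PSD matrix
phi_A_mean = ambient_psd_from_input(phi_y, g=0.2)        # fraction of the mean PSD
phi_A_min = ambient_psd_from_input(phi_y, g=0.2, mode="min")
phi_a = ambient_matrix(phi_A_mean, 2)
```

Using the minimum keeps the ambient estimate below every channel PSD, which keeps the diagonal of Φy - Φa positive.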
[145] According to another embodiment, the estimation is based on the arithmetic mean. Given the assumptions that lead to Formula (20) and Formula (21), the PSD ΦA can be computed using

[146] While tr{Φy} can be computed directly using, e.g., the recursive averaging of Formula (34a) or the weighted short-term moving average of Formula (34b), tr{Φd} is estimated as

[147] Alternatively, for N > 2, the estimate of ΦA(m,k) can be computed by choosing two input channel signals and estimating ΦA(m,k) for only that pair of channel signals. More accurate results are obtained when this procedure is applied to more than one pair of input channel signals and the results are combined, e.g., by averaging over all estimates. The subsets can be chosen by exploiting prior knowledge about channels having similar ambient power, e.g., by estimating the ambient power separately in all rear channels and in all front channels of a 5.1 recording.
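The arithmetic-mean estimate can be motivated by a short derivation. It assumes the additive model of the PSD matrices and an ambient PSD matrix equal to a scalar times the identity, with the scalar written φ_A below. This is a sketch of the reasoning, not a reproduction of the missing formula images.

```latex
\begin{align}
\Phi_y &= \Phi_d + \phi_A I_{N \times N} \\
\operatorname{tr}\{\Phi_y\} &= \operatorname{tr}\{\Phi_d\} + N \phi_A \\
\hat{\phi}_A &= \frac{1}{N}\left(\operatorname{tr}\{\Phi_y\} - \operatorname{tr}\{\Phi_d\}\right)
\end{align}
```

For a channel pair (N = 2) the same relation holds with the 2 × 2 submatrices, which is why pairwise estimates can be averaged.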
[148] In addition, note that from Formulas (20) and (35) it follows that

[149] According to some embodiments, Φ is determined by determining ΦA (e.g., according to Formula (35) or Formula (36) or according to Formulas (37) - (40)) and by using Formula (35a) to obtain the power spectral density information about the ambient signal parts of the audio input channel signals. Then, HD(βi) can be determined, for example, by using Formula (33a).
[150] Next, the choice for the βi parameter is considered.
[151] βi is a trade-off parameter. The trade-off parameter βi is a number.
[152] In some embodiments, only one trade-off parameter βi, which is valid for all audio input channel signals, is determined, and this trade-off parameter is then considered the trade-off information of the audio input channel signals.
[153] In other embodiments, one trade-off parameter βi is determined for each of the two or more audio input channel signals, and these two or more trade-off parameters of the audio input channel signals then together form the trade-off information.
[154] In further embodiments, the trade-off information need not be represented as a parameter, but may be represented in any other suitable format.
[155] As noted above, the parameter βi allows a trade-off between the reduction of the residual ambient signal and the distortion of the direct signal. It can be chosen to be constant or signal-dependent, as shown in Fig. 6b.
[156] Fig. 6b illustrates an apparatus according to another embodiment. The apparatus comprises an analysis filterbank 605 for transforming the audio input channel signals yt[n] from the time domain into the time-frequency domain. In addition, the apparatus comprises a synthesis filterbank 625 for transforming the one or more audio output channel signals (e.g., the estimated direct signal components d1[n], ..., dN[n] of the audio input channel signals) from the time-frequency domain into the time domain.
[157] A plurality of K beta determination units 1111, ..., 11K1 ("Compute Beta") determine the parameters βi. In addition, a plurality of K subfilter computation units 1112, ..., 11K2 determine subfilters HD^H(m,1), ..., HD^H(m,K). The beta determination units 1111, ..., 11K1 and the subfilter computation units 1112, ..., 11K2 together form the filter determination unit 110 of Fig. 1 and Fig. 6a according to a particular embodiment. The subfilters HD^H(m,1), ..., HD^H(m,K) together form the filter of Fig. 1 and Fig. 6a according to a particular embodiment.
[158] Furthermore, Fig. 6b illustrates a plurality of signal subprocessors 121, ..., 12K, where each signal subprocessor 121, ..., 12K is configured to apply one of the subfilters HD^H(m,1), ..., HD^H(m,K) to one of the audio input channel signals to obtain one of the audio output channel signals. The signal subprocessors 121, ..., 12K together form the signal processor 120 of Fig. 1 and Fig. 6a according to a particular embodiment.
[159] Below, different use cases are described to control the βi parameter through signal analysis.
[160] First, transient signals are considered.
[161] According to one embodiment, the filter determination unit 110 is configured to determine the trade-off information (βi, βj) depending on whether a transient is present in at least one of the two or more audio input channel signals.
[162] The estimation of the input PSD matrix works best for stationary signals. The decomposition of transient input signals, on the other hand, can result in leakage of the transient signal component into the ambient output signal. Controlling βi by means of a signal analysis with respect to the degree of non-stationarity or the transient presence probability, such that βi is smaller when the signal comprises transients and larger in sustained portions, results in more consistent output signals when the filters HD(βi) are applied. Controlling βi by means of such a signal analysis, such that βi is larger when the signal comprises transients and smaller in sustained portions, results in more consistent output signals when the filters HA(βi) are applied.
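One way to realize such a control is sketched below. The text only fixes the direction of the dependence. The linear interpolation, the function names and the numeric limits are assumptions chosen for illustration, and the transient presence probability is taken as given by some separate detector.

```python
def _interpolate(p_transient, value_sustained, value_transient):
    """Blend between the sustained and transient settings for a probability in [0, 1]."""
    p = min(max(p_transient, 0.0), 1.0)   # clamp to a valid probability
    return (1.0 - p) * value_sustained + p * value_transient


def beta_for_direct_filter(p_transient, beta_sustained=2.0, beta_transient=0.5):
    """Trade-off parameter for H_D: smaller when a transient is likely."""
    return _interpolate(p_transient, beta_sustained, beta_transient)


def beta_for_ambient_filter(p_transient, beta_sustained=0.5, beta_transient=2.0):
    """Trade-off parameter for H_A: larger when a transient is likely."""
    return _interpolate(p_transient, beta_sustained, beta_transient)


beta_d = beta_for_direct_filter(0.5)    # halfway between the two settings
beta_a = beta_for_ambient_filter(1.0)   # transient certainly present
```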
[163] Unwanted ambient signals are now considered.
[164] In one embodiment, the filter determination unit 110 is configured to determine the trade-off information (βi, βj) depending on the presence of additive noise in at least one signal channel through which one of the two or more audio input channel signals is transmitted.
[165] The proposed method decomposes the input signals regardless of the nature of the ambient signal components. When the input signals have been transmitted over noisy signal channels, it is advantageous to estimate the presence probability of undesired additive noise and to control βi such that the output DAR (direct-to-ambient ratio) is increased.
[166] The control of output signal levels is now described.
[167] To control the levels of the output signals, βi can be set separately for the i-th channel. The filters for computing the ambient output signal of the i-th channel are given by Formula (31).
[168] For any two channels i and j, βj can be computed from βi such that the PSDs of the residual ambient signals ra,i and ra,j in the i-th and j-th output channels are equal, i.e.,

[169] or

[170] Alternatively, βi can be computed so that the PSDs of the ambient output signals ai and aj are equal for all pairs i and j.
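The formulas relating the two trade-off parameters are not reproduced in this copy. One possible derivation is sketched below under stated assumptions: the rank-one filter form with denominator (β_i + λ), an ambient PSD matrix equal to φ_A times the identity, and a rank-one direct PSD matrix. The symbols φ_{r_a,i} (residual ambient PSD in channel i) and φ_{d_i} (direct PSD in channel i) are introduced here for illustration, and the original formulas may be stated differently.

```latex
\begin{align}
h_{D,i} &= \frac{\Phi_a^{-1} \Phi_d \, u_i}{\beta_i + \lambda},
  \qquad \lambda = \operatorname{tr}\{\Phi_a^{-1}\Phi_d\} \\
\phi_{r_a,i} &= h_{D,i}^H \Phi_a h_{D,i}
  = \frac{u_i^H \Phi_d \Phi_a^{-1} \Phi_d \, u_i}{(\beta_i + \lambda)^2}
  = \frac{\lambda \, \phi_{d_i}}{(\beta_i + \lambda)^2} \\
% the last step uses \Phi_d \Phi_d = \operatorname{tr}\{\Phi_d\}\,\Phi_d for rank-one \Phi_d
\phi_{r_a,i} = \phi_{r_a,j}
  \;&\Longrightarrow\;
  \beta_j = (\beta_i + \lambda)\sqrt{\frac{\phi_{d_j}}{\phi_{d_i}}} - \lambda
\end{align}
```

Under these assumptions, a channel carrying more direct power needs a larger trade-off parameter to reach the same residual ambient level.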
[171] The use of balance information is now considered.
[172] For the case of two input channels, the balance information quantifies the level differences between the two channels per subband. The balance information can be applied to control βi so as to control the perceived width of the output signals.
[173] Next, the equalization of the ambient output channel signals is considered.
[174] The processing described does not guarantee that all ambient output channel signals have equal subband powers. To ensure that all ambient output channel signals have equal subband powers, the filters are modified as described below for the embodiment that uses the filters HD described above. The covariance matrix of the ambient output signal (comprising the auto-PSDs of each channel on the main diagonal) can be obtained as

[175] To ensure that the PSDs of all ambient output channels are equal, the filters HD are replaced by H̃D:
H̃D = I - G + G HD
[176] where G is a diagonal matrix, whose elements on the main diagonal are

[177] For the embodiment using the filters HA described above, the covariance matrix of the ambient output signal (which comprises the auto-PSDs of each channel on the main diagonal) can be obtained as

[178] To ensure that the PSDs of all ambient output channels are equal, the filters HA are replaced by H̃A:

[179] Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, in which a block or device corresponds to a method step or a characteristic of a method step. Similarly, aspects described in the context of a method step also represent a description of a corresponding block or item or feature of a corresponding apparatus.
[180] The decomposed signal of the invention can be stored in a digital storage medium or it can be transmitted in a transmission medium such as a wireless transmission medium or a wired transmission medium such as the Internet.
[181] Depending on certain implementation requirements, the embodiments of the invention can be implemented in hardware or in software. The implementation can be carried out using a digital storage medium, for example a floppy disk, a DVD, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, having electronically readable control signals stored thereon, which cooperate (or are able to cooperate) with a programmable computer system so that the respective method is executed.
[182] Some embodiments according to the invention comprise a non-transitory data carrier having electronically readable control signals, which are capable of cooperating with a programmable computer system, so that one of the methods described herein is executed.
[183] In general, the embodiments of the present invention can be implemented as a computer program product with a program code, the program code being operative to execute one of the methods when the computer program product runs on a computer . Program code can, for example, be stored in a machine-readable medium.
[184] Other embodiments comprise the computer program for executing one of the methods described herein, stored on a machine-readable carrier.
[185] In other words, an embodiment of the method of the invention is therefore a computer program with program code for executing one of the methods described herein, when the computer program runs on a computer.
[186] Another embodiment of the methods of the invention is therefore a data carrier (or a digital storage medium or a computer readable medium) comprising, recorded therein, the computer program for executing one of the methods described herein.
[187] Another embodiment of the method of the invention is therefore a data stream or a sequence of signals representing the computer program for executing one of the methods described herein. The data stream or signal sequence can, for example, be configured to be transferred via a data communication link, for example via the Internet.
[188] Another embodiment comprises a processing means, for example, a computer or a programmable logic device, configured or adapted to perform one of the methods described herein.
[189] Another embodiment comprises a computer having installed thereon the computer program for executing one of the methods described herein.
[190] In some embodiments, a programmable logic device (e.g., a field-programmable gate array) can be used to perform some or all of the functionality of the methods described herein. In some embodiments, a field-programmable gate array can cooperate with a microprocessor to perform one of the methods described herein. Generally speaking, the methods are preferably performed by any hardware device.
[191] The embodiments described above are merely illustrative of the principles of the present invention. It is understood that modifications and variations of the arrangements and details described herein will be apparent to those skilled in the art. It is, therefore, intended that the invention be limited only by the scope of the appended patent claims and not by the specific details presented by way of description and explanation of the embodiments herein.
[192] References
[1] J. B. Allen, D. A. Berkeley, and J. Blauert, "Multimicrophone signal processing technique to remove room reverberation from speech signals", J. Acoust. Soc. Am., vol. 62, 1977.
[2] C. Avendano and J.-M. Jot, "A frequency-domain approach to multichannel upmix", J. Audio Eng. Soc., vol. 52, 2004.
[3] C. Faller, "Multiple-loudspeaker playback of stereo signals", J. Audio Eng. Soc., vol. 54, 2006.
[4] J. Merimaa, M. Goodwin, and J.-M. Jot, "Correlation-based ambience extraction from stereo recordings", in Proc. of the AES 123rd Conv., 2007.
[5] V. Pulkki, "Directional audio coding in spatial sound reproduction and stereo upmixing", in Proc. of the AES 28th Int. Conf., 2006.
[6] J. Usher and J. Benesty, "Enhancement of spatial sound quality: A new reverberation-extraction audio upmixer", IEEE Trans. on Audio, Speech, and Language Processing, vol. 15, pp. 2141-2150, 2007.
[7] A. Walther and C. Faller, "Direct-ambient decomposition and upmix of surround sound signals", in Proc. of IEEE WASPAA, 2011.
[8] C. Uhle, J. Herre, S. Geyersberger, F. Ridderbusch, A. Walter, and O. Moser, "Apparatus and method for extracting an ambient signal in an apparatus and method for obtaining weighting coefficients for extracting an ambient signal and computer program", US Patent Application 2009/0080666, 2009.
[9] C. Uhle, J. Herre, A. Walther, O. Hellmuth, and C. Janssen, "Apparatus and method for generating an ambient signal from an audio signal, apparatus and method for deriving a multi-channel audio signal from an audio signal and computer program", US Patent Application 2010/0030563, 2010.
[10] G. Soulodre, "System for extracting and changing the reverberant content of an audio input signal", US Patent 8,036,767, Date of Patent: October 11, 2011.

Claims (14)
[0001]
1. Apparatus for creating one or more audio output channel signals depending on two or more audio input channel signals, wherein each of the two or more audio input channel signals comprises direct signal parts and ambient signal parts, characterized in that the apparatus comprises: a filter determination unit (110) for determining a filter by estimating first power spectral density information and by estimating second power spectral density information, wherein the filter depends on the first power spectral density information and on the second power spectral density information, and a signal processor (120) for creating the one or more audio output channel signals by applying the filter to the two or more audio input channel signals, wherein the one or more audio output channel signals depend on the filter, wherein the filter determination unit (110) is configured to estimate the first power spectral density information by estimating, for each audio input channel signal of the two or more audio input channel signals, power spectral density information on said audio input channel signal, and the filter determination unit (110) is configured to estimate the second power spectral density information by estimating, for each audio input channel signal of the two or more audio input channel signals, power spectral density information on the ambient signal parts of said audio input channel signal; or wherein the filter determination unit (110) is configured to estimate the first power spectral density information by estimating, for each audio input channel signal of the two or more audio input channel signals, power spectral density information on said audio input channel signal, and the filter determination unit (110) is configured to estimate the second power spectral density information by estimating, for each audio input channel signal of the two or more audio input channel signals, power spectral density information on the direct signal parts of said audio input channel signal; or wherein the filter determination unit (110) is configured to estimate the first power spectral density information by estimating, for each audio input channel signal of the two or more audio input channel signals, power spectral density information on the direct signal parts of said audio input channel signal, and the filter determination unit (110) is configured to estimate the second power spectral density information by estimating, for each audio input channel signal of the two or more audio input channel signals, power spectral density information on the ambient signal parts of said audio input channel signal.
[0002]
Apparatus according to claim 1, characterized in that the apparatus further comprises an analysis filterbank (605) for transforming the two or more audio input channel signals from a time domain into a time-frequency domain, wherein the filter determination unit (110) is configured to determine the filter by estimating the first power spectral density information and the second power spectral density information depending on the audio input channel signals, which are represented in the time-frequency domain, wherein the signal processor (120) is configured to create the one or more audio output channel signals, which are represented in the time-frequency domain, by applying the filter to the two or more audio input channel signals, which are represented in the time-frequency domain, and wherein the apparatus further comprises a synthesis filterbank (625) for transforming the one or more audio output channel signals, which are represented in the time-frequency domain, from the time-frequency domain into the time domain.
[0003]
Apparatus according to claim 1 or 2, characterized in that the filter determination unit (110) is configured to determine the filter by estimating the first power spectral density information, by estimating the second power spectral density information, and by determining audio input channel signal information (βi, βj) depending on at least one of the two or more audio input channel signals.
[0004]
Apparatus according to claim 3, characterized in that the filter determination unit (110) is configured to determine the audio input channel signal information (βi, βj) depending on whether a transient is present in at least one of the two or more audio input channel signals.
[0005]
Apparatus according to claim 3 or 4, characterized in that the filter determination unit (110) is configured to determine the audio input channel signal information (βi, βj) depending on the presence of additive noise in at least one signal channel through which one of the two or more audio input channel signals is transmitted.
[0006]
Apparatus according to any one of claims 3 to 5, characterized in that the filter determination unit (110) is configured to determine the power spectral density information on the two or more audio input channel signals depending on a first matrix (Φy), the first matrix (Φy) comprising an estimate of the power spectral density for each channel signal of the two or more audio input channel signals on the main diagonal of the first matrix (Φy), and is configured to determine the power spectral density information on the ambient signal parts of the two or more audio input channel signals depending on a second matrix (Φa) or depending on an inverse matrix (Φa^-1) of the second matrix (Φa), the second matrix (Φa) comprising an estimate of the power spectral density for the ambient signal parts of each channel signal of the two or more audio input channel signals on the main diagonal of the second matrix (Φa); or wherein the filter determination unit (110) is configured to determine the power spectral density information on the two or more audio input channel signals depending on the first matrix (Φy), and is configured to determine the power spectral density information on the direct signal parts of the two or more audio input channel signals depending on a third matrix (Φd) or depending on an inverse matrix (Φd^-1) of the third matrix (Φd), the third matrix (Φd) comprising an estimate of the power spectral density for the direct signal parts of each channel signal of the two or more audio input channel signals on the main diagonal of the third matrix (Φd); or wherein the filter determination unit (110) is configured to determine the power spectral density information on the ambient signal parts of the two or more audio input channel signals depending on the second matrix (Φa) or depending on an inverse matrix (Φa^-1) of the second matrix (Φa), and is configured to determine the power spectral density information on the direct signal parts of the two or more audio input channel signals depending on the third matrix (Φd) or depending on an inverse matrix (Φd^-1) of the third matrix (Φd).
[0007]
Apparatus according to claim 6, characterized in that the filter determination unit (110) is configured to determine the first matrix (Φy) to determine the power spectral density information on the two or more audio input channel signals, and is configured to determine the second matrix (Φa) or an inverse matrix (Φa^-1) of the second matrix (Φa) to determine the power spectral density information on the ambient signal parts of the two or more audio input channel signals; or wherein the filter determination unit (110) is configured to determine the first matrix (Φy) to determine the power spectral density information on the two or more audio input channel signals, and is configured to determine the third matrix (Φd) or an inverse matrix (Φd^-1) of the third matrix (Φd) to determine the power spectral density information on the direct signal parts of the two or more audio input channel signals; or wherein the filter determination unit (110) is configured to determine the second matrix (Φa) or an inverse matrix (Φa^-1) of the second matrix (Φa) to determine the power spectral density information on the ambient signal parts of the two or more audio input channel signals, and is configured to determine the third matrix (Φd) or an inverse matrix (Φd^-1) of the third matrix (Φd) to determine the power spectral density information on the direct signal parts of the two or more audio input channel signals.
[0008]
Apparatus according to claim 6 or 7, characterized in that the filter determination unit (110) is configured to determine the filter HD (βi) depending on the formula
[0009]
Apparatus according to any one of claims 3 to 8, characterized in that the filter determination unit (110) is configured to determine an audio input channel signal parameter (βi, βj) for each of the two or more audio input channel signals as the audio input channel signal information (βi, βj), wherein the audio input channel signal parameter (βi, βj) of each of the audio input channel signals depends on that audio input channel signal.
[0010]
Apparatus according to claim 8, characterized in that the filter determination unit (110) is configured to determine an audio input channel signal parameter (βi, βj) for each of the two or more audio input channel signals as the audio input channel signal information (βi, βj), so that for each pair of a first audio input channel signal of the audio input channel signals and another, second audio input channel signal of the audio input channel signals
[0011]
Apparatus according to claim 8 or 10, characterized in that the filter determination unit (110) is configured to determine the second matrix Φa according to the formula
[0012]
Apparatus according to claim 11, characterized in that the filter determination unit (110) is configured to determine ΦA depending on the two or more audio input channel signals.
[0013]
Apparatus according to any one of claims 1 to 7, characterized in that the filter determination unit (110) is configured to determine an intermediate filter matrix HD by estimating direct signal components of the two or more audio input channel signals, by estimating the first power spectral density information and by estimating the second power spectral density information, and wherein the filter determination unit (110) is configured to determine the filter H̃D depending on the intermediate filter matrix HD according to the formula H̃D = I - G + G HD, where I is a unit matrix and G is a diagonal matrix, wherein the signal processor (120) is configured to create the one or more audio output channel signals by applying the filter H̃D to the two or more audio input channel signals.
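The relationship between the intermediate filter matrix and the filter that is finally applied (identity minus G, plus G times the intermediate matrix) can be illustrated with a short Python sketch. The function name `modified_filter`, the list-of-lists matrix layout and the example gain values are assumptions made for illustration only; the claim does not specify how G is chosen.

```python
def modified_filter(h_intermediate, gains):
    """Return I - G + G*H for a diagonal G = diag(gains).

    h_intermediate: square matrix (list of rows) standing in for the
    intermediate filter matrix; gains: diagonal entries of G (assumed values).
    """
    n = len(h_intermediate)
    result = []
    for i in range(n):
        row = []
        for j in range(n):
            identity = 1.0 if i == j else 0.0
            # (I - G)[i][j] is (1 - g_i) on the diagonal and 0 elsewhere;
            # (G*H)[i][j] is g_i * H[i][j] because G is diagonal.
            row.append(identity * (1.0 - gains[i]) + gains[i] * h_intermediate[i][j])
        result.append(row)
    return result


# Hypothetical 2x2 intermediate filter for one frequency bin.
h = [[0.5, 0.1], [0.2, 0.4]]
full = modified_filter(h, [1.0, 1.0])    # G = I: the intermediate filter is used unchanged
bypass = modified_filter(h, [0.0, 0.0])  # G = 0: the filter reduces to the unit matrix
mixed = modified_filter(h, [0.5, 1.0])   # per-channel blend of the two extremes
```

Under this reading, each diagonal entry of G acts as a per-channel control that moves the corresponding row of the filter between "pass the input through" and "apply the intermediate filter".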
[0014]
14. Method for creating one or more audio output channel signals depending on two or more audio input channel signals, wherein each of the two or more audio input channel signals comprises direct signal parts and ambient signal parts, characterized in that the method comprises: determining a filter by estimating a first power spectral density information and estimating a second power spectral density information, wherein the filter depends on the first power spectral density information and the second power spectral density information; and creating the one or more audio output channel signals by applying the filter to the two or more audio input channel signals, wherein the one or more audio output channel signals depend on the filter; wherein the estimation of the first power spectral density information is conducted by estimating, for each audio input channel signal of the two or more audio input channel signals, power spectral density information on said audio input channel signal, and the estimation of the second power spectral density information is conducted by estimating, for each audio input channel signal of the two or more audio input channel signals, power spectral density information on the ambient signal parts of said audio input channel signal; or wherein the estimation of the first power spectral density information is conducted by estimating, for each audio input channel signal of the two or more audio input channel signals, power spectral density information on said audio input channel signal, and the estimation of the second power spectral density information is conducted by estimating, for each audio input channel signal of the two or more audio input channel signals, power spectral density information on the direct signal parts of said audio input channel signal; or wherein the estimation of the first power spectral density information is conducted by estimating, for each audio input channel signal of the two or more audio input channel signals, power spectral density information on the direct signal parts of said audio input channel signal, and the estimation of the second power spectral density information is conducted by estimating, for each audio input channel signal of the two or more audio input channel signals, power spectral density information on the ambient signal parts of said audio input channel signal.
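As a rough illustration of the method steps (estimate power spectral density information, derive a filter from it, apply the filter), the following Python sketch processes a single time-frequency bin of a two-channel signal. The recursive-averaging estimator, the smoothing constant `alpha`, the Wiener-type filter form I - Φa·Φy⁻¹ and all function names are assumptions chosen for illustration; the claim does not prescribe them, and the ambient matrix is simply taken as given here.

```python
def smooth_psd(frames, alpha=0.9):
    """Recursively average y*y^H over frames (assumed PSD matrix estimator)."""
    n = len(frames[0])
    phi = [[0j] * n for _ in range(n)]
    for y in frames:
        for i in range(n):
            for j in range(n):
                inst = y[i] * y[j].conjugate()  # instantaneous (cross-)power
                phi[i][j] = alpha * phi[i][j] + (1.0 - alpha) * inst
    return phi


def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix (two-channel case only)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def direct_filter(phi_y, phi_a):
    """Assumed Wiener-type direct-signal filter H = I - phi_a * inv(phi_y)."""
    prod = matmul(phi_a, inverse_2x2(phi_y))
    return [[(1.0 if i == j else 0.0) - prod[i][j] for j in range(2)]
            for i in range(2)]


def apply_filter(h, y):
    """Filter one input vector y (one bin, all channels) with matrix h."""
    return [sum(h[i][j] * y[j] for j in range(len(y))) for i in range(len(h))]


# Hypothetical data: input PSD matrix and a given ambient PSD matrix.
phi_y = [[2.0, 0.5], [0.5, 1.0]]
phi_a = [[0.4, 0.0], [0.0, 0.4]]
h = direct_filter(phi_y, phi_a)
direct_estimate = apply_filter(h, [3 + 1j, -2j])
```

With an all-zero ambient matrix this filter passes the input through unchanged, and with an ambient matrix equal to the input matrix it suppresses the input entirely, which matches the intuition of removing the ambient share of the signal.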

Similar technologies:

Publication number | Publication date | Patent title

BR112015021520B1|2021-07-13|APPARATUS AND METHOD FOR CREATING ONE OR MORE AUDIO OUTPUT CHANNEL SIGNALS DEPENDING ON TWO OR MORE AUDIO INPUT CHANNEL SIGNALS

US20200152210A1|2020-05-14|Determining the inter-channel time difference of a multi-channel audio signal

BR112013014173B1|2021-07-20|APPARATUS AND METHOD FOR DECOMPOSITING AN INPUT SIGNAL USING A PRE-CALCULATED REFERENCE CURVE

US9449603B2|2016-09-20|Multi-channel audio encoder and method for encoding a multi-channel audio signal

BRPI0816638B1|2020-03-10|DEVICE AND METHOD FOR MULTI-CHANNEL SIGNAL GENERATION INCLUDING VOICE SIGNAL PROCESSING

ES2552996T3|2015-12-03|Method and apparatus for decomposing a stereo recording using frequency domain processing using a spectral weighting generator

Patent family:

Publication number | Publication date

HK1219378A1|2017-03-31|

MX2015011570A|2015-12-09|

WO2014135235A1|2014-09-12|

CN105409247A|2016-03-16|

EP2965540B1|2019-05-22|

TW201444383A|2014-11-16|

RU2015141871A|2017-04-07|

EP2965540A1|2016-01-13|

MY179136A|2020-10-28|

JP6637014B2|2020-01-29|

CN105409247B|2020-12-29|

SG11201507066PA|2015-10-29|

RU2650026C2|2018-04-06|

CA2903900A1|2014-09-12|

BR112015021520A2|2017-08-22|

PL2965540T3|2019-11-29|

MX354633B|2018-03-14|

JP2016513814A|2016-05-16|

KR20150132223A|2015-11-25|

CA2903900C|2018-06-05|

JP2018036666A|2018-03-08|

US10395660B2|2019-08-27|

AU2013380608A1|2015-10-29|

KR101984115B1|2019-05-31|

AR095026A1|2015-09-16|

TWI639347B|2018-10-21|

US20150380002A1|2015-12-31|

ES2742853T3|2020-02-17|

JP6385376B2|2018-09-05|

AU2013380608B2|2017-04-20|

Cited documents:

Publication number | Filing date | Publication date | Applicant | Patent title

US8345890B2|2006-01-05|2013-01-01|Audience, Inc.|System and method for utilizing inter-microphone level differences for speech enhancement|

US8036767B2|2006-09-20|2011-10-11|Harman International Industries, Incorporated|System for extracting and changing the reverberant content of an audio input signal|

DE102006050068B4|2006-10-24|2010-11-11|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and method for generating an environmental signal from an audio signal, apparatus and method for deriving a multi-channel audio signal from an audio signal and computer program|

WO2008126347A1|2007-03-16|2008-10-23|Panasonic Corporation|Voice analysis device, voice analysis method, voice analysis program, and system integration circuit|

CN101816191B|2007-09-26|2014-09-17|弗劳恩霍夫应用研究促进协会|Apparatus and method for extracting an ambient signal|

DE102007048973B4|2007-10-12|2010-11-18|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and method for generating a multi-channel signal with voice signal processing|

JP5508550B2|2010-02-24|2014-06-04|フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン|Apparatus for generating extended downmix signal, method and computer program for generating extended downmix signal|

PL2965540T3|2013-03-05|2019-11-29|Fraunhofer Ges Forschung|Apparatus and method for multichannel direct-ambient decomposition for audio signal processing|

US20140355769A1|2013-05-29|2014-12-04|Qualcomm Incorporated|Energy preservation for decomposed representations of a sound field|

US9466305B2|2013-05-29|2016-10-11|Qualcomm Incorporated|Performing positional analysis to code spherical harmonic coefficients|

US9502045B2|2014-01-30|2016-11-22|Qualcomm Incorporated|Coding independent frames of ambient higher-order ambisonic coefficients|

US9922656B2|2014-01-30|2018-03-20|Qualcomm Incorporated|Transitioning of ambient higher-order ambisonic coefficients|

US9620137B2|2014-05-16|2017-04-11|Qualcomm Incorporated|Determining between scalar and vector quantization in higher order ambisonic coefficients|

US9852737B2|2014-05-16|2017-12-26|Qualcomm Incorporated|Coding vectors decomposed from higher-order ambisonics audio signals|

US10770087B2|2014-05-16|2020-09-08|Qualcomm Incorporated|Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals|

US9747910B2|2014-09-26|2017-08-29|Qualcomm Incorporated|Switching between predictive and non-predictive quantization techniques in a higher order ambisonics framework|

CN105992120B|2015-02-09|2019-12-31|杜比实验室特许公司|Upmixing of audio signals|

EP3067885A1|2015-03-09|2016-09-14|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and method for encoding or decoding a multi-channel signal|

CA2979598C|2015-03-27|2020-08-18|Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.|Apparatus and method for processing stereo signals for reproduction in cars to achieve individual three-dimensional sound by frontal loudspeakers|

CN106297813A|2015-05-28|2017-01-04|杜比实验室特许公司|The audio analysis separated and process|

US10448188B2|2015-09-30|2019-10-15|Dolby Laboratories Licensing Corporation|Method and apparatus for generating 3D audio content from two-channel stereo content|

US9930466B2|2015-12-21|2018-03-27|Thomson Licensing|Method and apparatus for processing audio content|

CN106412792B|2016-09-05|2018-10-30|上海艺瓣文化传播有限公司|The system and method that spatialization is handled and synthesized is re-started to former stereo file|

GB201716522D0|2017-10-09|2017-11-22|Nokia Technologies Oy|Audio signal rendering|

EP3573058B1|2018-05-23|2021-02-24|Harman Becker Automotive Systems GmbH|Dry sound and ambient sound separation|

US10796704B2|2018-08-17|2020-10-06|Dts, Inc.|Spatial audio signal decoder|

WO2020037282A1|2018-08-17|2020-02-20|Dts, Inc.|Spatial audio signal encoder|

CN109036455B|2018-09-17|2020-11-06|中科上声（苏州）电子有限公司|Direct sound and background sound extraction method, loudspeaker system and sound reproduction method thereof|

WO2020247033A1|2019-06-06|2020-12-10|Dts, Inc.|Hybrid spatial audio decoder|

DE102020108958A1|2020-03-31|2021-09-30|Harman Becker Automotive Systems Gmbh|Method for presenting a first audio signal while a second audio signal is being presented|

Legal status:
2018-11-21| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|

2020-02-04| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|

2021-05-04| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|

2021-07-13| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 23/10/2013, SUBJECT TO THE LEGAL CONDITIONS. |

Priority:

Application number | Filing date | Patent title

US201361772708P| true| 2013-03-05|2013-03-05|

US61/772,708|2013-03-05|

PCT/EP2013/072170|WO2014135235A1|2013-03-05|2013-10-23|Apparatus and method for multichannel direct-ambient decomposition for audio signal processing|
